The Best 28 Pose Estimation Tools in 2025
Superpoint
Other
SuperPoint is a fully convolutional network, trained with self-supervision, for interest point detection and description.
Pose Estimation
Transformers

magic-leap-community
59.12k
13
Vitpose Base Simple
Apache-2.0
ViTPose is a human pose estimation model based on the Vision Transformer. It reports 81.1 AP on the MS COCO keypoint test set and emphasizes model simplicity, scalable size, and flexible training.
Pose Estimation
Transformers English

usyd-community
51.40k
20
Vitpose Plus Small
Apache-2.0
ViTPose++ is a vision Transformer-based human pose estimation model, achieving outstanding performance of 81.1 AP on the MS COCO keypoint detection benchmark.
Pose Estimation
Transformers

usyd-community
30.02k
2
Vitpose Plus Base
Apache-2.0
ViTPose++ is a vision Transformer-based human pose estimation model that achieves 81.1 AP on the MS COCO keypoint detection benchmark with a simple design.
Pose Estimation
Transformers English

usyd-community
22.26k
10
Superglue Outdoor
Other
SuperGlue is a graph neural network-based feature matching model for matching interest points in images, suitable for image matching and pose estimation tasks.
Pose Estimation
Transformers

magic-leap-community
18.39k
2
Vitpose Plus Huge
Apache-2.0
ViTPose++ is a vision Transformer-based foundational model for human pose estimation, achieving an outstanding performance of 81.1 AP on the MS COCO keypoint test set.
Pose Estimation
Transformers

usyd-community
14.49k
6
Img2pose
img2pose is a Faster R-CNN-based model for predicting the six degrees of freedom (6DoF) pose of all faces in a photo and projecting 3D faces onto a 2D plane.
Pose Estimation
py-feat
4,440
0
Vitpose Plus Large
Apache-2.0
ViTPose++ is a vision Transformer-based foundation model for human pose estimation, achieving an outstanding performance of 81.1 AP on the MS COCO keypoint test set.
Pose Estimation
Transformers

usyd-community
1,731
1
Synthpose Vitpose Huge Hf
Apache-2.0
SynthPose is a keypoint detection model built on the ViTPose Huge backbone and fine-tuned with synthetic data to predict 52 human keypoints, making it suitable for kinematic analysis.
Pose Estimation
Transformers

stanfordmimi
1,320
1
Sapiens Pose 1b Torchscript
Sapiens is a vision Transformer model pre-trained on 300 million 1024x1024 resolution human images, specifically designed for high-precision pose estimation tasks.
Pose Estimation English
facebook
1,245
7
Synthpose Vitpose Base Hf
Apache-2.0
SynthPose is a 2D human pose estimation model based on ViTPose Base and fine-tuned with synthetic data to predict 52 anatomical keypoints.
Pose Estimation
Transformers

stanfordmimi
931
3
Reloc3r 512
Reloc3r is a concise and efficient camera pose estimation framework that combines a pretrained dual-view relative camera pose regression network with a multi-view motion averaging module.
Pose Estimation
siyan824
840
4
Vitpose Base
Apache-2.0
ViTPose is a vision Transformer-based human pose estimation model that achieves 81.1 AP on the MS COCO keypoint test set.
Pose Estimation
Transformers English

usyd-community
761
9
Lightglue Superpoint
Other
LightGlue is an efficient keypoint detection and matching model for feature matching and pose estimation problems in computer vision.
Pose Estimation
Transformers

ETH-CVG
316
20
Reloc3r 224
Reloc3r is a large-scale relative camera pose regression model for visual localization, designed for generalization, speed, and precision.
Pose Estimation
Safetensors
siyan824
172
2
Vitpose Base Simple
A Transformer-based keypoint detection model that locates keypoints in images.
Pose Estimation
Transformers

nielsr
109
1
Sapiens Pose Bbox Detector
Apache-2.0
RTMDet is an efficient object detector used here to produce the person bounding boxes that the Sapiens pose estimation model takes as input for human keypoint detection.
Pose Estimation
facebook
107
3
Sapiens Pose 1b
Pose-Sapiens-1B is a high-resolution human pose estimation model based on the Vision Transformer architecture. It is pre-trained on 300 million human images at 1024x1024 resolution and detects 308 keypoints (body, face, hands, and feet).
Pose Estimation English
facebook
82
4
Poseless 3B
Apache-2.0
Poseless-3B is a vision-language model (VLM)-based robotic hand control framework that directly maps 2D images to joint angles without explicit pose estimation.
Pose Estimation
Transformers

Menlo
65
10
Sapiens Pose 0.3b Torchscript
Sapiens is a vision Transformer model pre-trained on 300 million high-resolution human images and designed for pose estimation; it detects 308 keypoints.
Pose Estimation English
facebook
55
1
Vitpose Base Coco Aic Mpii
Apache-2.0
ViTPose is a human pose estimation model based on Vision Transformer, achieving outstanding performance on benchmarks like MS COCO through simple architectural design.
Pose Estimation
Transformers English

usyd-community
38
1
Vitpose Base Simple
A lightweight pose estimation model based on the ViT architecture for human keypoint detection.
Pose Estimation
Transformers

onnx-community
31
3
Sapiens Pose 1b Bfloat16
Sapiens is a family of vision Transformer models pre-trained on 300 million human images at 1024x1024 resolution, focusing on human-centric vision tasks.
Pose Estimation English
facebook
31
0
Sapiens Pose 0.6b Torchscript
Sapiens is a vision Transformer model pre-trained on 300 million high-resolution human images and designed for pose estimation; it detects 308 keypoints.
Pose Estimation English
facebook
29
0
Diffusion Pusht Keypoints
Apache-2.0
A robot control model trained with Diffusion Policy for the PushT task, using keypoint observations as input.
Pose Estimation
Transformers

lerobot
21
0
Vitpose Base Simple
Apache-2.0
ViTPose is a baseline model for human pose estimation built on plain vision Transformers, achieving high-performance keypoint detection with a simple architecture.
Pose Estimation
Transformers English

danelcsb
20
1
Sapiens Pose 0.6b
Sapiens is a family of vision Transformer models pre-trained on 300 million high-resolution human images, focusing on human-centric vision tasks.
Pose Estimation English
facebook
19
2
Vitpose
This model is used to detect keypoints in images or videos, suitable for tasks such as human pose estimation and facial landmark detection.
Pose Estimation
Transformers

shauray
19
0
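
Several entries above, such as Img2pose, describe predicting a six-degrees-of-freedom (6DoF) pose and projecting 3D points onto a 2D image plane. The following is a minimal sketch of that projection step only, not the code of any listed model. The function names, the Euler-angle convention (R = Rz·Ry·Rx), and the simple pinhole camera with a single focal length are all assumptions made for illustration.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Build R = Rz @ Ry @ Rx from Euler angles in radians (assumed convention)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def project_points(points_3d, pose, focal, u0, v0):
    """Project 3D points to pixels with a 6DoF pose (rx, ry, rz, tx, ty, tz).

    Hypothetical helper: applies the rigid transform X_cam = R @ X + t,
    then a pinhole projection with focal length `focal` and principal
    point (u0, v0), both in pixels.
    """
    rx, ry, rz, tx, ty, tz = pose
    rot = rotation_matrix(rx, ry, rz)
    pixels = []
    for x, y, z in points_3d:
        xc = rot[0][0] * x + rot[0][1] * y + rot[0][2] * z + tx
        yc = rot[1][0] * x + rot[1][1] * y + rot[1][2] * z + ty
        zc = rot[2][0] * x + rot[2][1] * y + rot[2][2] * z + tz
        if zc <= 0:
            raise ValueError("point is behind the camera")
        pixels.append((focal * xc / zc + u0, focal * yc / zc + v0))
    return pixels


if __name__ == "__main__":
    # A point one unit to the right of the origin, seen two units away.
    print(project_points([(1.0, 0.0, 0.0)], (0, 0, 0, 0, 0, 2.0), 100.0, 50.0, 60.0))
```

The three rotation angles and three translation components together make up the six degrees of freedom. Real models may use a different rotation parameterization (for example, a rotation vector) and full camera intrinsics.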